Counterfactual harm
Human-AI Collaborative Uncertainty Quantification
Noorani, Sima, Kiyani, Shayan, Pappas, George, Hassani, Hamed
AI predictive systems are increasingly embedded in decision-making pipelines, shaping high-stakes choices once made solely by humans. Yet robust decisions under uncertainty still rely on capabilities that current AI lacks: domain knowledge not captured by data, long-horizon context, and reasoning grounded in the physical world. This gap has motivated growing efforts to design collaborative frameworks that combine the complementary strengths of humans and AI. This work advances this vision by identifying the fundamental principles of human-AI collaboration within uncertainty quantification, a key component of reliable decision making. We introduce Human-AI Collaborative Uncertainty Quantification, a framework that formalizes how an AI model can refine a human expert's proposed prediction set with two goals: avoiding counterfactual harm, ensuring the AI does not degrade correct human judgments, and complementarity, enabling recovery of correct outcomes the human missed. At the population level, we show that the optimal collaborative prediction set follows an intuitive two-threshold structure over a single score function, extending a classical result in conformal prediction. Building on this insight, we develop practical offline and online calibration algorithms with provable distribution-free finite-sample guarantees. The online method adapts to distribution shifts, including human behavior evolving through interaction with AI, a phenomenon we call Human-to-AI Adaptation. Experiments across image classification, regression, and text-based medical decision making show that collaborative prediction sets consistently outperform either agent alone, achieving higher coverage and smaller set sizes across various conditions.
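The two-threshold structure described in this abstract lends itself to a compact illustration. The following is a minimal sketch, not the authors' implementation: the score function, the thresholds `tau_keep` and `tau_add`, and the rule for combining them with the human's proposed set are assumptions made here for concreteness, and the calibration that gives the paper's distribution-free guarantees is omitted.

```python
# Minimal sketch of a two-threshold collaborative prediction set.
# Hypothetical construction: a label proposed by the human is kept
# unless its score falls below tau_keep (guarding against counterfactual
# harm), and a label the human omitted is added when its score exceeds
# tau_add (providing complementarity). In the actual framework the
# thresholds would be calibrated for coverage; here they are fixed inputs.

def collaborative_set(human_set, scores, tau_keep, tau_add):
    """human_set: set of labels proposed by the expert.
    scores: dict mapping each candidate label to a model score s(x, y).
    Returns the collaborative prediction set."""
    kept = {y for y in human_set if scores[y] >= tau_keep}
    added = {y for y in scores if y not in human_set and scores[y] >= tau_add}
    return kept | added

# Toy usage: the human proposes {"flu", "cold"}; the model rescues
# "covid" (high score it believes the human missed) and drops nothing.
scores = {"flu": 0.60, "cold": 0.20, "covid": 0.85, "allergy": 0.05}
print(collaborative_set({"flu", "cold"}, scores, tau_keep=0.10, tau_add=0.80))
```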
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.87)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.87)
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Asia > Middle East > Jordan (0.04)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
Appendix A
We give examples of computing the path-specific harm in Appendices B-D. Omission problem: Alice decides not to give Bob a set of golf clubs. According to the CCA, Alice's decision not to give Bob the clubs harms him: whatever utility function describes Bob's preferences, the counterfactual outcome 'Bob given clubs' leaves Bob better off than the factual outcome, so the omission counts as harm. For example, if Bob's utility is U(y) = y (i.e. 1 for clubs, 0 for no clubs), then the harm caused by Alice is P(Y = 1 had she given the clubs) = 1, since Bob in fact receives none. Note there are other reasonable scenarios where Alice's actions would constitute harm, for instance one in which we would say 'the clerk Alice harmed Bob by not giving him golf clubs'. A preemption variant: a moment later, Eve would have robbed Bob of his clubs.
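A minimal numerical sketch of the harm computation in this excerpt, assuming the counterfactual definition of harm as the expected positive part of the utility shortfall relative to a default action, harm = E[max(0, U(Y_default) - U(Y_action))]. The deterministic joint distribution and all names below are our own illustration, not the paper's code.

```python
# Sketch: counterfactual harm for the golf-clubs omission example,
# assuming harm = E[max(0, U(Y_default) - U(Y_action))], where Y_action
# is the outcome under the action taken (withhold) and Y_default the
# counterfactual outcome under the default action (give). The joint
# distribution here is hypothetical and deterministic.

def counterfactual_harm(joint, utility):
    """joint: iterable of (p, y_action, y_default) triples summing to 1.
    utility: function mapping an outcome to a real-valued utility."""
    return sum(p * max(0.0, utility(y_def) - utility(y_act))
               for p, y_act, y_def in joint)

U = lambda y: y                       # 1 if Bob has clubs, 0 otherwise
joint = [(1.0, 0, 1)]                 # withhold -> no clubs; give -> clubs
print(counterfactual_harm(joint, U))  # 1.0: the omission is maximally harmful
```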
- Leisure & Entertainment (0.68)
- Health & Medicine > Therapeutic Area (0.67)
Controlling Counterfactual Harm in Decision Support Systems Based on Prediction Sets
Straitouri, Eleni, Thejaswi, Suhas, Rodriguez, Manuel Gomez
Decision support systems based on prediction sets help humans solve multiclass classification tasks by narrowing down the set of potential label values to a subset of them, namely a prediction set, and asking them to always predict label values from the prediction sets. While systems of this type have been proven effective at improving the average accuracy of the predictions made by humans, by restricting human agency they may cause harm: a human who has succeeded at predicting the ground-truth label of an instance on their own may have failed had they used these systems. In this paper, our goal is to control how frequently a decision support system based on prediction sets may cause harm, by design. To this end, we start by characterizing the above notion of harm using the theoretical framework of structural causal models. Then, we show that, under a natural, albeit unverifiable, monotonicity assumption, we can estimate how frequently a system may cause harm using only predictions made by humans on their own. Further, we also show that, under a weaker monotonicity assumption, which can be verified experimentally, we can bound how frequently a system may cause harm, again using only predictions made by humans on their own. Building upon these assumptions, we introduce a computational framework to design decision support systems based on prediction sets that are guaranteed to cause harm less frequently than a user-specified value, using conformal risk control. We validate our framework using real human predictions from two different human subject studies and show that, in decision support systems based on prediction sets, there is a trade-off between accuracy and counterfactual harm.
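A rough sketch of how a harm bound of this kind might be calibrated, in the spirit of conformal risk control. The harm proxy below (the human alone was correct, but the true label's score falls under the threshold that defines the prediction set) is our reading of the monotonicity assumption, and every name in the snippet is hypothetical rather than the authors' API.

```python
# Sketch: pick the most aggressive set-defining threshold whose
# (finite-sample-corrected) estimated counterfactual harm rate stays
# below a user-specified alpha, using only human-alone predictions.
import numpy as np

def calibrate_threshold(scores_true, human_correct, alpha, grid):
    """scores_true: model score of the true label on calibration points.
    human_correct: boolean array, 1 if the human alone was correct.
    Returns the largest threshold whose corrected harm estimate <= alpha."""
    n = len(scores_true)
    best = min(grid)            # most conservative choice (largest sets)
    for lam in sorted(grid):    # larger lam -> smaller sets -> more harm
        # Harm proxy: human was right, but the true label leaves the set.
        harm = np.mean(human_correct & (scores_true < lam))
        if (n * harm + 1) / (n + 1) <= alpha:   # finite-sample correction
            best = lam
        else:
            break
    return best

rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)        # scores of the true labels
human = rng.uniform(size=1000) < 0.7   # human alone correct ~70% of the time
lam = calibrate_threshold(scores, human, alpha=0.05,
                          grid=np.linspace(0.0, 1.0, 101))
print(f"calibrated threshold: {lam:.2f}")
```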
Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning
Vaskov, Sean, Schwarting, Wilko, Baker, Chris L.
Reinforcement Learning (RL) for control has become increasingly popular due to its ability to learn rich feedback policies that take into account uncertainty and complex representations of the environment. When considering safety constraints, constrained optimization approaches, where agents are penalized for constraint violations, are commonly used. In such methods, if agents are initialized in, or must visit, states where constraint violation might be inevitable, it is unclear how much they should be penalized. We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to a default, safe policy. In a philosophical sense this formulation only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem. We present simulation studies on a rover with uncertain road friction and a tractor-trailer parking environment that demonstrate our constraint formulation enables agents to learn safer policies than contemporary constrained RL methods.
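The constraint idea admits a one-line illustration. Below is a minimal sketch under our own naming, reducing the paper's constrained-optimization formulation to a simple excess-violation penalty against a default safe policy; it is not the authors' implementation.

```python
# Sketch of the "do no harm" idea: penalize only constraint violations
# beyond what a default safe policy would have incurred from the same
# state, so unavoidable violations do not dominate the objective.

def counterfactual_violation_penalty(violation_pi, violation_safe):
    """violation_pi: constraint violation incurred by the learned policy.
    violation_safe: violation the default safe policy would incur from
    the same state (e.g. estimated by rollout or a learned model).
    Only the excess over the safe baseline is charged to the learner."""
    return max(0.0, violation_pi - violation_safe)

# If violation is inevitable (the safe policy also violates by 0.4),
# the learner is only charged for the extra 0.1 it caused.
print(counterfactual_violation_penalty(0.5, 0.4))  # 0.1
print(counterfactual_violation_penalty(0.3, 0.4))  # 0.0
```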
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)
Counterfactual harm
Richens, Jonathan G., Beard, Rory, Thompson, Daniel H.
To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. However, to date there is no statistical method for measuring harm and factoring it into algorithmic decisions. In this paper we propose the first formal definition of harm and benefit using causal models. We show that any factual definition of harm must violate basic intuitions in certain scenarios, and show that standard machine learning algorithms that cannot perform counterfactual reasoning are guaranteed to pursue harmful policies following distributional shifts. We use our definition of harm to devise a framework for harm-averse decision making using counterfactual objective functions. We demonstrate this framework on the problem of identifying optimal drug doses using a dose-response model learned from randomized controlled trial data. We find that the standard method of selecting doses using treatment effects results in unnecessarily harmful doses, while our counterfactual approach allows us to identify doses that are significantly less harmful without sacrificing efficacy.
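A self-contained sketch of the dose-selection comparison described here, under assumptions of our own: harm is taken to be the expected positive part of U(Y_default) - U(Y_dose) relative to the no-treatment default, the toy structural model stands in for a dose-response model fit to trial data, and the harm-aversion weight `lam` is arbitrary. None of the names below come from the paper.

```python
# Sketch: treatment-effect dose selection vs. a harm-penalized
# counterfactual objective, on a toy structural causal model where a
# higher dose aids recovery but can cause adverse events in patients
# who would have recovered anyway. U(y) = y, with y = 1 meaning healthy.
import numpy as np

rng = np.random.default_rng(1)

def sample_outcomes(dose, n=100_000):
    """Shared exogenous noise couples the factual outcome under `dose`
    with the counterfactual outcome under the default (dose = 0)."""
    u_rec = rng.uniform(size=n)            # susceptibility to recovery
    u_adv = rng.uniform(size=n)            # susceptibility to side effects
    recovers = u_rec < 0.3 + 0.6 * dose    # a higher dose aids recovery...
    adverse = u_adv < 0.5 * dose ** 2      # ...but risks an adverse event
    y_dose = (recovers & ~adverse).astype(float)
    y_none = (u_rec < 0.3).astype(float)   # default: no treatment
    return y_dose, y_none

def benefit_and_harm(dose):
    y_dose, y_none = sample_outcomes(dose)
    return (np.mean(np.maximum(0.0, y_dose - y_none)),   # counterfactual benefit
            np.mean(np.maximum(0.0, y_none - y_dose)))   # counterfactual harm

lam = 2.0                                  # harm-aversion weight (hypothetical)
doses = np.linspace(0.0, 1.0, 11)
results = {d: benefit_and_harm(d) for d in doses}
# Treatment-effect criterion: maximize benefit - harm (the average effect).
te_dose = max(results, key=lambda d: results[d][0] - results[d][1])
# Counterfactual criterion: additionally penalize harm itself.
cf_dose = max(results, key=lambda d: results[d][0] - (1 + lam) * results[d][1])
print(f"treatment-effect dose: {te_dose:.1f}, harm-averse dose: {cf_dose:.1f}")
```

On this toy model the treatment-effect criterion picks a higher dose (around 0.7) than the harm-averse one (around 0.5), mirroring the paper's qualitative finding that maximizing average effect can tolerate avoidable harm.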
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.87)
- Law (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government (0.68)
- (2 more...)